(54) Vertex cache for 3D computer graphics

(57) In a 3D interactive computer graphics system 
such as a video game display system, polygon vertex 
data is fed to the display engine via a vertex cache used 
to cache and organize indexed primitive vertex data 
streams. The vertex cache may be a small, low-latency 
cache memory local to the display engine hardware. 
Polygons can be represented as indexed arrays, e.g., 
indexed linear lists of data components representing 
some feature of a vertex (for example, positions, colors, 
surface normals, or texture coordinates). The vertex 
cache can fetch the relevant blocks of indexed vertex 
attribute data on an as-needed basis to make it availa- 
ble to the display processor - providing spatial locality 
for display processing without requiring the vertex data 
to be prestored in display order. Efficiency can be 
increased by customizing and optimizing the vertex 
cache and associated tags for the purpose of delivering 
vertices to the graphics engine - allowing more efficient 
prefetching and assembling of vertices than might be 
possible using a general-purpose cache and tag struc- 
ture. 
Description

FIELD OF THE INVENTION

[0001] The present invention relates to 3D interactive computer graphics and, more specifically, to arrangements and techniques for efficiently representing and storing vertex information for animation and display processing. Still more particularly, the invention relates to a 3D graphics integrated circuit including a vertex cache for more efficient imaging of 3D polygon data.

BACKGROUND AND SUMMARY OF THE INVENTION



[0002] Modern 3D computer graphics systems construct animated displays from display primitives, i.e., polygons. Each display object (e.g., a tree, a car, a person or other character) is typically constructed from a number of individual polygons. Each polygon is represented by its vertices, which together specify the location, orientation and size of the polygon in three-dimensional space, along with other characteristics (e.g., color, surface normals for shading, textures, etc.). Using these techniques, computers can efficiently construct rich animated 3D graphical scenes.
[0003] Low-cost, high-speed interactive 3D graphics systems such as video game systems are constrained in terms of memory and processing resources. It is therefore important in such systems to be able to efficiently represent and process the various polygons making up a display object. For example, it is desirable to make the data representing the display object compact, and to present the data to the 3D graphics system in such a way that all of the data needed for a particular task is conveniently available.
[0004] Data can be characterized in terms of temporal locality and spatial locality. Temporal locality means that the same data is referenced frequently within a short period of time. In general, the polygon-representing data for typical 3D interactive graphics applications has a large degree of temporal locality. Spatial locality means that the next data item referenced is stored close in memory to the last one referenced. Efficiency improvements can be realized by increasing the data's spatial locality. In a practical memory system that does not allow unlimited low-latency random access to an unlimited amount of data, performance increases if all of the data needed to perform a given task is stored close together in low-latency memory.
[0005] To increase the spatial locality of the data, one can sort the polygon data based on the order of processing, assuring that all of the data needed to perform a particular task will be presented at close to the same time so it can be stored together. For example, polygon data making up animations can be sorted in a way that favors the type of animation being performed. As one example, typical complex interactive real-time animation such as surface deformation requires manipulation of all the vertices at the surface. To perform such animation efficiently, it is desirable to sort the vertex data in a certain way.

[0006] Typical 3D graphics systems perform animation processing and display processing separately, and these separate steps process the data differently. Unfortunately, the optimal order in which to sort the vertex data for animation processing is generally not the optimal sort order for display processing. Sorting for animation may tend to add randomness to display ordering. Sorting a data stream to simplify animation processing therefore makes it harder to display that data efficiently.

[0007] Thus, for various reasons, it may not be possible to assume that spatial locality exists when accessing data for display. Difficulty arises from the need to efficiently access an arbitrarily large display object. In addition, for the reasons explained above, there will typically be some amount of randomness, at least for display purposes, in the order in which the vertex data is presented to the display engine. Furthermore, there may be other data locality above the vertex level that would be useful to exploit (e.g., grouping together all polygons that share a certain texture).
[0008] One approach to achieving higher efficiency is to provide additional low-latency memory (e.g., the lowest-latency memory system affordable). It might also be possible to fit a display object in fast local memory to achieve random access. However, objects can be quite large and may need to be double-buffered, so the buffers required for such an approach could be very large. It might also be possible to use the main CPU's data cache to assemble and sort the polygon data in an optimal order for the display engine. To do this effectively, however, there would have to be some way to prevent the polygon data from thrashing the rest of the data cache. In addition, the data would need to be prefetched to hide memory latency, since there will probably be some randomness in the way even data sorted for display order is accessed. This approach would also place additional load on the CPU, especially since certain implementations might need to assemble the data in a binary format the display engine can interpret. Using this approach, the main CPU and the display engine would operate serially, with the CPU feeding the data directly to the graphics engine. Parallelizing the processing (e.g., feeding the display engine through a DRAM FIFO buffer) would require substantial additional memory access bandwidth compared to immediate-mode feeding.
[0009] Thus, there exists a need for more efficient techniques that can be used to represent, store and deliver polygon data for a 3D graphics display process.
[0010] The present invention solves this problem by 
providing a vertex cache to organize indexed primitive 
vertex data streams. 
[0011] In accordance with one aspect provided by the present invention, polygon vertex data is fed to the display engine via a vertex cache. The vertex cache may be a small, low-latency memory that is local to (e.g., part of) the display engine hardware. Flexibility and efficiency result from the cache providing a virtual memory view much larger than the actual cache contents.

[0012] The vertex cache may be used to build up the vertex data needed for display processing on the fly, on an as-needed basis. Thus, rather than pre-sorting the vertex data for display purposes, the vertex cache can simply fetch the relevant blocks of data as they are needed and make them available to the display processor. We have found that, owing to the high degree of temporal locality exhibited by the vertex data for interactive video game display and the use of particularly optimal indexed-array data structures (see below), most of the vertex data needed at any given time will be available in even a small set-associative vertex cache having a number of cache lines proportional to the number of vertex data streams. One example optimum arrangement provides a 512 x 128-bit dual-ported RAM to form an 8-way set-associative vertex cache.
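By way of illustration only, the effect described above can be modeled in software. The following is a minimal Python sketch of an N-way set-associative cache with least-recently-used replacement, assuming 32-byte lines, 32 sets and 8 ways (8 KB in total, the capacity of a 512 x 128-bit RAM). The line size, set count, replacement policy and the two attribute arrays are assumptions, not details taken from this description. The arrays are fetched through the cache in indexed draw order, and vertices shared between triangles hit in the cache.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, line_bytes=32, num_sets=32, ways=8):
        self.line_bytes = line_bytes
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set; keys are tags, ordered least- to most-recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_sets   # which set the line maps to
        tag = line // self.num_sets    # identifies the line within that set
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # refresh LRU position on a hit
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) == self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = None
        return False


def fetch_indexed(cache, indices, streams):
    """Look up every attribute stream for every vertex index, in draw order."""
    for i in indices:
        for base, stride in streams:
            cache.access(base + i * stride)
    return cache.hits, cache.misses


# Two hypothetical attribute arrays: 12-byte positions and 4-byte colors.
streams = [(0x0000, 12), (0x1000, 4)]
# Four triangles sharing vertices, as an indexed triangle list.
indices = [0, 1, 2, 2, 1, 3, 2, 3, 4, 4, 3, 5]
print(fetch_indexed(SetAssociativeCache(), indices, streams))  # (21, 3)
```

Each attribute stream occupies its own cache lines, so having several ways per set lets the streams coexist rather than evict one another; this is why the number of lines is tied to the number of vertex data streams.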
[0013] Efficiency can be increased by customizing and optimizing the vertex cache and associated tags for the purpose of delivering vertices to the graphics engine, allowing more efficient prefetching and assembling of vertices than might be possible using a general-purpose cache and tag structure. Because the vertex cache allows data to be fed directly to the display engine, the cost of additional memory access bandwidth is avoided. Direct memory access may be used to efficiently transfer vertex data into the vertex cache.
[0014] To further increase the efficiencies afforded by the vertex cache, it is desirable to reduce the need to completely re-specify a particular polygon or set of polygons each time it is (they are) used. In accordance with a further aspect provided by the present invention, polygons can be represented as arrays, e.g., linear lists of data components representing some feature of a vertex (for example, positions, colors, surface normals, or texture coordinates). Each display object may be represented as a collection of such arrays along with various sets of indices. The indices reference the arrays for a particular animation or display purpose. Representing polygon data as indexed component lists allows discontinuities between mappings. Further, separating out individual components allows data to be stored more compactly (e.g., in a fully compressed format). The vertex cache provided by the present invention can accommodate streams of such indexed data up to the index size.
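A minimal sketch of the indexed-array representation follows, assuming Python lists for the component arrays and a tuple of per-component indices for each vertex; the class and field names are hypothetical. It shows two vertices that share one position while referencing different normals, one example of the discontinuities between mappings that indexed component lists allow without duplicating the shared data.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedObject:
    """A display object stored as separate component arrays plus index tuples.

    Field names are illustrative; each vertex is a tuple of indices
    (position_index, normal_index, color_index) into the arrays below.
    """
    positions: list
    normals: list
    colors: list
    vertices: list = field(default_factory=list)

    def assemble(self, v):
        """Dereference vertex v's indices into a complete (position, normal, color)."""
        p, n, c = self.vertices[v]
        return self.positions[p], self.normals[n], self.colors[c]


# One cube corner seen from two faces: the position is stored once, but each
# face references a different normal.
corner = IndexedObject(
    positions=[(1.0, 1.0, 1.0)],
    normals=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    colors=[(255, 255, 255)],
    vertices=[(0, 0, 0), (0, 1, 0)],
)
top, side = corner.assemble(0), corner.assemble(1)
print(top[0] == side[0], top[1] != side[1])  # True True
```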

[0015] Through use of an indexed vertex representation in conjunction with the vertex cache, no re-sorting is needed for display purposes. For example, the vertex data may be presented to the display engine in an order presorted for animation rather than for display, making animation a more efficient process. The vertex cache uses the indexed vertex data structure representation to make the vertex data efficiently available to the display engine without any need for explicit re-sorting.

[0016] Any vertex component can be index-refer- 
enced or directly inlined in the command stream. This 
enables efficient data processing by the main processor 
without requiring the main processor's output to con- 
form to the graphics display data structure. For exam- 
ple, lighting operations performed by the main 
processor may generate only a color array from a list of 
normals and positions by loop-processing a list of light- 
ing parameters to generate the color array. There is no 
need for the animation process to follow a triangle list 
display data structure, nor does the animation process 
need to reformat the data for display. The display proc- 
ess can naturally consume the data provided by the ani- 
mation process without adding substantial data 
reformatting overhead to the animation process. 
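As an illustration of the lighting example above, the following sketch computes a color array from position and normal arrays by looping over a list of lights. The diffuse point-light model, unit-length normals and clamping to [0, 1] are assumptions chosen for brevity. The point is only that the output is a plain array that a vertex stream can index, with no triangle-list structure imposed on the lighting loop.

```python
import math


def light_colors(positions, normals, lights):
    """Return a new color array, one RGB triple per vertex.

    Assumptions: unit-length normals, diffuse (Lambert) point lights given as
    (light_position, (r, g, b)), and results clamped to at most 1.0.
    The position and normal arrays are read but never rewritten.
    """
    colors = []
    for p, n in zip(positions, normals):
        rgb = [0.0, 0.0, 0.0]
        for light_pos, light_rgb in lights:
            d = [light_pos[k] - p[k] for k in range(3)]
            dist = math.sqrt(sum(x * x for x in d)) or 1.0  # avoid divide-by-zero
            n_dot_l = max(0.0, sum(n[k] * d[k] for k in range(3)) / dist)
            for k in range(3):
                rgb[k] += light_rgb[k] * n_dot_l
        colors.append(tuple(min(1.0, c) for c in rgb))
    return colors


positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
lights = [((0.0, 0.0, 5.0), (1.0, 0.5, 0.25))]
print(light_colors(positions, normals, lights))
# [(1.0, 0.5, 0.25), (0.0, 0.0, 0.0)]
```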
[0017] On the other hand, there is no penalty for 
sorting the vertex data in display order; the vertex data 
is efficiently presented to the display engine in either 
case, without the vertex cache significantly degrading 
performance vis-a-vis a vertex presentation structure 
optimized for presenting data presorted for display. 
[0018] In accordance with a further aspect provided by this invention, the vertex data includes quantized, compressed data streams in any of several different formats (e.g., 8-bit fixed point, 16-bit fixed point, or floating point). This data can be indexed (i.e., referenced by the vertex data stream) or direct (i.e., contained within the stream itself). These various data formats can all be stored in the common vertex cache, and subsequently decompressed and converted into a common format for the graphics display pipeline. Such hardware support of flexible types, formats and numbers of attributes as either immediate or indexed input data avoids complex and time-consuming software data conversion.
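The conversion to a common format can be sketched as follows, assuming big-endian packed components and a hypothetical per-attribute `frac_bits` value giving the number of fractional bits of each fixed-point format. This is an illustration of inverse quantization in software, not the hardware's actual encoding.

```python
import struct

# struct codes for the component formats; big-endian order is an assumption.
CODES = {"u8": "B", "s8": "b", "u16": "H", "s16": "h", "f32": "f"}


def to_float(raw, fmt, count, frac_bits=0):
    """Convert `count` packed components of one format to Python floats.

    Fixed-point formats are scaled by 2**-frac_bits; the per-attribute
    frac_bits shift is a hypothetical parameter, not taken from the text.
    """
    values = struct.unpack(">" + CODES[fmt] * count, raw)
    if fmt == "f32":
        return [float(v) for v in values]
    scale = 1.0 / (1 << frac_bits)
    return [v * scale for v in values]


# Three different encodings converge on one floating-point representation.
print(to_float(bytes([0x80, 0x40]), "s8", 2, frac_bits=6))       # [-2.0, 1.0]
print(to_float(struct.pack(">h", -256), "s16", 1, frac_bits=8))  # [-1.0]
print(to_float(struct.pack(">f", 1.5), "f32", 1))                # [1.5]
```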

BRIEF DESCRIPTION OF THE DRAWINGS 

[0019] These and other features and advantages 
provided by the present invention will be better and 
more completely understood by referring to the follow- 
ing detailed description of preferred embodiments in 
conjunction with the drawings of which: 

Figure 1 is a block diagram of an example interactive 3D graphics system;

Figure 1A is a block diagram of the example graphics and audio coprocessor shown in Figure 1;

Figure 1B is a more detailed schematic diagram of portions of the Figure 1A graphics and audio coprocessor showing an example 3D pipeline graphics processing arrangement;

Figure 2 shows an example command processor including a vertex cache provided with vertex index array data;

Figure 2A shows an example display list processor including a vertex cache provided in accordance with the present invention;

Figure 2B shows an example dual FIFO arrangement;

Figure 3 is a schematic diagram of an example indexed vertex data structure;

Figure 3A shows an example vertex descriptor block;

Figure 4 is a block diagram of an example vertex cache implementation;

Figure 5 shows an example vertex cache memory address format; and

Figure 6 shows an example vertex cache tag status register format.

DETAILED DESCRIPTION OF PRESENTLY PREFERRED EXAMPLE EMBODIMENTS

[0020] Figure 1 is a schematic diagram of an overall example interactive 3D computer graphics system 100 in which the present invention may be practiced. System 100 can be used to play interactive 3D video games accompanied by interesting stereo sound. Different games can be played by inserting appropriate storage media such as optical disks into an optical disk player 134. A game player can interact with system 100 in real time by manipulating input devices such as handheld controllers 132, which may include a variety of controls such as joysticks, buttons, switches, keyboards or keypads, etc.

[0021] System 100 includes a main processor (CPU) 102, a main memory 104, and a graphics and audio coprocessor 106. In this example, main processor 102 receives inputs from handheld controllers 132 (and/or other input devices) via coprocessor 106. Main processor 102 interactively responds to such user inputs and executes a video game or other graphics program supplied, for example, by external storage 134. For example, main processor 102 can perform collision detection and animation processing in addition to a variety of real-time interactive control functions.
[0022] Main processor 102 generates 3D graphics and audio commands and sends them to graphics and audio coprocessor 106. The graphics and audio coprocessor 106 processes these commands to generate interesting visual images on a display 136 and stereo sounds on stereo loudspeakers 137R, 137L or other suitable sound-generating devices.
[0023] System 100 includes a TV encoder 140 that receives image signals from coprocessor 106 and converts the image signals into composite video signals suitable for display on a standard display device 136 (e.g., a computer monitor or home color television set). System 100 also includes an audio codec (compressor/decompressor) 138 that compresses and decompresses digitized audio signals (and may also convert between digital and analog audio signalling formats). Audio codec 138 can receive audio inputs via a buffer 140 and provide them to coprocessor 106 for processing (e.g., mixing with other audio signals the coprocessor generates and/or receives via a streaming audio output of optical disk device 134). Coprocessor 106 stores audio-related information in a memory 144 that is dedicated to audio tasks. Coprocessor 106 provides the resulting audio output signals to audio codec 138 for decompression and conversion to analog signals (e.g., via buffer amplifiers 142L, 142R) so they can be played by speakers 137L, 137R.

[0024] Coprocessor 106 has the ability to communicate with various peripherals that may be present within system 100. For example, a parallel digital bus 146 may be used to communicate with optical disk device 134. A serial peripheral bus 148 may communicate with a variety of peripherals including, for example, a ROM and/or real-time clock 150, a modem 152, and flash memory 154. A further external serial bus 156 may be used to communicate with additional expansion memory 158 (e.g., a memory card).



Graphics And Audio Coprocessor

[0025] Figure 1A is a block diagram of components within coprocessor 106. Coprocessor 106 may be a single integrated circuit. In this example, coprocessor 106 includes a 3D graphics processor 107, a processor interface 108, a memory interface 110, an audio digital signal processor (DSP) 162, an audio memory interface (I/F) 164, an audio interface and mixer 166, a peripheral controller 168, and a display controller 128.
[0026] 3D graphics processor 107 performs graphics processing tasks, and audio digital signal processor 162 performs audio processing tasks. Display controller 128 accesses image information from memory 104 and provides it to TV encoder 140 for display on display device 136. Audio interface and mixer 166 interfaces with audio codec 138, and can also mix audio from different sources (e.g., a streaming audio input from disk 134, the output of audio DSP 162, and external audio input received via audio codec 138). Processor interface 108 provides a data and control interface between main processor 102 and coprocessor 106. Memory interface 110 provides a data and control interface between coprocessor 106 and memory 104. In this example, main processor 102 accesses main memory 104 via processor interface 108 and memory controller 110, which are part of coprocessor 106. Peripheral controller 168 provides a data and control interface between coprocessor 106 and the various peripherals mentioned above (e.g., optical disk device 134, controllers 132, ROM and/or real-time clock 150, modem 152, flash memory 154, and memory card 158). Audio memory interface 164 provides an interface with audio memory 144.

[0027] Figure 1B shows a more detailed view of 3D graphics processor 107 and associated components within coprocessor 106. 3D graphics processor 107 includes a command processor 114 and a 3D graphics pipeline 116. Main processor 102 communicates streams of graphics data (i.e., display lists) to command processor 114. Command processor 114 receives these display commands and parses them (obtaining any additional data necessary to process them from memory 104), and provides a stream of vertex commands to graphics pipeline 116 for 3D processing and rendering. Graphics pipeline 116 generates a 3D image based on these commands. The resulting image information may be transferred to main memory 104 for access by display controller 128, which displays the frame buffer output of pipeline 116 on display 136.
[0028] In more detail, main processor 102 may store display lists in main memory 104 and pass pointers to command processor 114 via bus interface 108. Command processor 114 (which includes a vertex cache 212 discussed in detail below) fetches the command stream from CPU 102, fetches vertex attributes from the command stream and/or from vertex arrays in memory, converts attribute types to floating-point format, and passes the resulting complete vertex polygon data to graphics pipeline 116 for rendering/rasterization. As explained in more detail below, vertex data can come directly from the command stream and/or from a vertex array in memory where each attribute is stored in its own linear array. Memory arbitration circuitry 130 arbitrates memory access among graphics pipeline 116, command processor 114 and display unit 128. As explained below, an on-chip 8-way set-associative vertex cache 212 is used to reduce vertex attribute access latency.

[0029] As shown in Figure 1B, graphics pipeline 116 may include a transform unit 118, a setup/rasterizer 120, a texture unit 122, a texture environment unit 124 and a pixel engine 126. In graphics pipeline 116, transform unit 118 performs a variety of 3D transform operations, and may also perform lighting and texture effects. For example, transform unit 118 transforms incoming geometry per vertex from object space to screen space; transforms incoming texture coordinates and computes projective texture coordinates; performs polygon clipping; performs per-vertex lighting computations; and performs bump mapping texture coordinate generation. Setup/rasterizer 120 includes a setup unit which receives vertex data from transform unit 118 and sends triangle setup information to rasterizers performing edge rasterization, texture coordinate rasterization and color rasterization. Texture unit 122 performs various tasks related to texturing, including multi-texture handling, post-cache texture decompression, texture filtering, embossed bump mapping, shadows and lighting through the use of projective textures, and BLIT with alpha transparency and depth. Texture unit 122 outputs filtered texture values to texture environment unit 124. Texture environment unit 124 blends the polygon color and texture color together, performing texture fog and other environment-related functions. Pixel engine 126 performs z-buffering and blending, and stores data into an on-chip frame buffer memory.
[0030] Thus, graphics pipeline 116 may include one or more embedded DRAM memories (not shown) to store frame buffer and/or texture information locally. The on-chip frame buffer is periodically written to main memory 104 for access by display unit 128. The frame buffer output of graphics pipeline 116 (which is ultimately stored in main memory 104) is read each frame by display unit 128. Display unit 128 provides digital RGB pixel values for display on display 136.

Vertex Cache And Vertex Index Array 

[0031] Figure 2 is a schematic illustration of com- 
mand processor 114 including a vertex cache 212 and 
a display list processor 213. Command processor 114 
handles a wide range of vertex and primitive data struc- 
tures, from a single stream of vertex data containing 
position, normal, texture coordinates and colors to fully 
indexed arrays. Any vertex component can be index-ref- 
erenced or directly inlined in the command stream. 
Command processor 114 thus supports flexible types, 
formats and numbers of attributes as either immediate 
or indexed data. 

[0032] Display list processor 213 within command processor 114 processes display list commands provided by CPU 102, typically via a buffer allocated within main memory 104. Vertex cache 212 caches indexed polygon vertex data structures such as the example data structure 300 shown in Figure 2. Example indexed polygon vertex data structure 300 may include a vertex index array 304 which references a number of vertex component data arrays (e.g., a color data array 306a, a texture vertex data array 306b, a surface normal data array 306c, a position vertex data array 306d, and so on). Vertex cache 212 accesses the vertex data from these arrays 306 in main memory 104 and caches them for fast access and use by display list processor 213.

Display List Processor 

[0033] Figure 2A shows an example display list processor 213 implemented by command processor 114. In this Figure 2A example, display list processor 213 provides several stages of parsing. Display list commands received from main processor 102 are interpreted by a display list stream parser 200. Display list stream parser 200 may use an address stack 202 to provide nesting of instructions, or dual FIFOs may be used to store a stream of vertex commands from a FIFO in main memory 104 to allow subroutine branching in instancing (see Figure 2B) without the need to reload prefetched vertex command data. Using the Figure 2B approach, the display list commands may thus provide for a one-level-deep display list, where the top-level command stream can call the display list one level deep. This "call" capability is useful for pre-computed commands and instancing in geometry.
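The one-level-deep call can be sketched in Python as a command flattener. The tuple command encoding and the error raised for a nested call are assumptions made for illustration. The sketch only shows the top-level stream resuming where it left off after a called list has been consumed, the behavior the dual FIFOs are described as providing for instancing.

```python
def run_command_stream(top_level, display_lists):
    """Flatten a top-level command stream that may CALL stored display lists.

    Commands are hypothetical tuples such as ("DRAW", n) or ("CALL", name).
    The called list is drained separately while the top-level stream's
    position is preserved; a CALL inside a called list is rejected because
    only one level of nesting is supported.
    """
    out = []
    for cmd in top_level:
        if cmd[0] != "CALL":
            out.append(cmd)
            continue
        for sub in display_lists[cmd[1]]:
            if sub[0] == "CALL":
                raise ValueError("display lists may be called only one level deep")
            out.append(sub)
    return out


# Instancing: the same pre-computed "tree" list is drawn under two matrices.
lists = {"tree": [("DRAW", "trunk"), ("DRAW", "leaves")]}
stream = [("MATRIX", 0), ("CALL", "tree"), ("MATRIX", 1), ("CALL", "tree")]
for cmd in run_command_stream(stream, lists):
    print(cmd)
```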

[0034] Display list stream parser 200 routes commands that affect the state of graphics pipeline 116 to the graphics pipeline. The remaining primitive command stream is parsed by a primitive stream parser 204 based on a primitive descriptor obtained from memory 104 (see below).

[0035] The indices to vertices are de-referenced and parsed by a vertex stream parser 208 based on a vertex descriptor 306, which may be provided in a table in hardware. The vertex stream provided to vertex stream parser 208 may include such indices to vertex data stored within main memory 104. Vertex stream parser 208 can access this vertex data from main memory 104 via vertex cache 212, thus providing the vertex commands and the associated referenced vertex attributes via different paths in the case of indexed, as opposed to direct, data. In one example, vertex stream parser 208 addresses vertex cache 212 as if it were the entirety of main memory 104. Vertex cache 212, in turn, retrieves (and often may prefetch) vertex data from main memory 104 and caches it temporarily for use by vertex stream parser 208. Caching the vertex data in vertex cache 212 reduces the number of accesses to main memory 104, and thus the main memory bandwidth required by command processor 114.

[0036] Vertex stream parser 208 provides data for each vertex to be rendered within each triangle (polygon). This per-vertex data is provided, along with the per-primitive data output by primitive stream parser 204, to a decompression/inverse quantizer block 214. Inverse quantizer 214 converts different vertex representations (e.g., 8-bit and 16-bit fixed-point format data) to a uniform floating-point representation used by graphics pipeline 116. Inverse quantizer 214 provides hardware support for a flexible variety of data types, formats and numbers of attributes, and such data can be presented to display list processor 213 as either immediate or indexed input data. The uniform floating-point representation output of inverse quantizer 214 is provided to graphics pipeline 116 for rasterization and further processing. If desired, as an optimization, a further small cache or buffer may be provided at the output of inverse quantizer 214 to avoid the need to re-transform vertex strip data.
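The optional buffer after inverse quantizer 214 can be sketched as a small cache of converted vertices keyed by vertex index. The capacity of four entries, the FIFO replacement and the `convert` callback are hypothetical. The example counts how many conversions a short triangle strip needs when its shared vertices are reused from the buffer.

```python
from collections import OrderedDict


class PostTransformCache:
    """Small FIFO buffer of already-converted vertices, keyed by vertex index.

    Capacity and FIFO replacement are assumptions; `convert` stands in for the
    inverse-quantize/transform work that a hit avoids repeating.
    """

    def __init__(self, convert, capacity=4):
        self.convert = convert
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order doubles as FIFO order
        self.conversions = 0

    def get(self, index):
        if index in self.entries:
            return self.entries[index]  # hit: no re-conversion needed
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
        self.conversions += 1
        value = self.convert(index)
        self.entries[index] = value
        return value


# A five-vertex strip yields three triangles that share most of their vertices.
cache = PostTransformCache(convert=lambda i: float(i) * 2.0)
for tri in [(0, 1, 2), (1, 2, 3), (2, 3, 4)]:
    for i in tri:
        cache.get(i)
print(cache.conversions)  # 5 conversions for 9 vertex references
```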

Vertex Index Array

[0037] Figure 3 shows a more detailed example of an indexed vertex list 300 of the preferred embodiment, used to provide indirect (i.e., indexed) vertex attribute data via vertex cache 212. This generalized indexed vertex list 300 may be used to define primitives in the system shown in Figure 1. Each primitive is described by a list of indices, each of which indexes into an array of vertices. Vertices and primitives each use format descriptors to define the types of their items. These descriptors associate an attribute with a type. An attribute is a data item that has a specific meaning to the rendering hardware. This makes it possible to program the hardware with descriptors so that it can parse and convert the vertex/primitive stream as it is loaded. Using the minimum size type and the minimum number of attributes per vertex leads to geometry compression. The Figure 3 arrangement also allows attributes to be associated with the vertices, the indices, or the primitive, as desired.

[0038] Thus, in the Figure 3 example indexed vertex list 300, a primitive list 302 defines each of the various primitives (e.g., triangles) in the data stream (e.g., prim0, prim1, prim2, prim3, ...). A primitive descriptor block 308 may provide attributes common to a primitive (e.g., texture and connectivity data, which may be direct or indexed). Each primitive within primitive list 302 indexes corresponding vertices within a vertex list 304. A single vertex within vertex list 304 may be used by multiple primitives within primitive list 302. If desired, primitive list 302 may be implied rather than explicit, i.e., vertex list 304 can be ordered in such a way as to define corresponding primitives by implication (e.g., using triangle strips).
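When the primitive list is implied, each triangle must be derived from the order of the vertex list. A minimal sketch of that derivation for a triangle strip follows; the index swap on every odd triangle, which keeps a consistent winding order, is a common graphics convention assumed here rather than one stated in this description. Five strip indices define three triangles, whereas an explicit triangle list would need nine.

```python
def strip_to_triangles(strip):
    """Expand an implied primitive list (a triangle strip) into explicit triangles.

    Each new index forms a triangle with the two indices before it. Swapping
    the first two indices of every odd triangle keeps a consistent winding
    order; that convention is an assumption borrowed from common graphics APIs.
    """
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        triangles.append((b, a, c) if i % 2 else (a, b, c))
    return triangles


print(strip_to_triangles([0, 1, 2, 3, 4]))
# [(0, 1, 2), (2, 1, 3), (2, 3, 4)]
```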

[0039] A vertex descriptor block 306 may be provided for each vertex within vertex list 304. Vertex descriptor block 306 includes attribute data corresponding to a particular vertex (e.g., rgb or other color data, alpha data, xyz surface normal data). As shown in Figure 2, vertex descriptor block 306 may comprise a number of different indexed component blocks. The vertex attribute descriptor block 306 defines which vertex attributes are present, the number and size of the components, and how the components are referenced (e.g., either direct, that is, included within the quantized vertex data stream, or indexed). In one example, the vertices in a DRAW command for a particular primitive all have the same vertex attribute data structure format.
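How a descriptor can drive parsing is sketched below. The descriptor tuple layout, the mode names, the big-endian byte order and the pre-decoded attribute arrays are all assumptions for illustration. The sketch shows one vertex format that mixes an indexed attribute (a 16-bit index into a position array) with a direct one (an RGBA color carried inline in the stream).

```python
import struct

# Direct-component struct codes; big-endian byte order is an assumption.
CODES = {"u8": "B", "s8": "b", "s16": "h", "f32": "f"}


def parse_vertices(stream, descriptor, arrays, vertex_count):
    """Decode vertices from a byte stream according to a vertex descriptor.

    descriptor: list of (name, mode, fmt, ncomp). mode is "direct" (components
    are inline in the stream) or "index8"/"index16" (an index into arrays[name]).
    For indexed attributes this sketch assumes arrays[name] is already decoded,
    so fmt and ncomp are used only for direct attributes. All names are
    hypothetical.
    """
    offset = 0
    vertices = []
    for _ in range(vertex_count):
        vertex = {}
        for name, mode, fmt, ncomp in descriptor:
            if mode == "direct":
                code = ">" + CODES[fmt] * ncomp
                vertex[name] = struct.unpack_from(code, stream, offset)
            else:
                code = ">B" if mode == "index8" else ">H"
                (idx,) = struct.unpack_from(code, stream, offset)
                vertex[name] = arrays[name][idx]
            offset += struct.calcsize(code)
        vertices.append(vertex)
    return vertices


# Indexed 16-bit position reference plus a direct RGBA8 color per vertex.
descriptor = [("position", "index16", "f32", 3), ("color", "direct", "u8", 4)]
arrays = {"position": [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]}
stream = struct.pack(">H4B", 1, 255, 0, 0, 255) + struct.pack(">H4B", 0, 0, 255, 0, 255)
for v in parse_vertices(stream, descriptor, arrays, 2):
    print(v)
```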
[0040] Figure 3A shows an example list of attributes provided by vertex attribute block 306. The following attributes may be provided:
Attribute
Position
Normal
Diffused Color
Specular Color
Texture 0 Coordinate
Texture 1 Coordinate
Texture 2 Coordinate
Texture 3 Coordinate
Texture 4 Coordinate
Texture 5 Coordinate
Texture 6 Coordinate
Texture 7 Coordinate

[0041] In this example vertex attribute descriptor block 306, the position attribute is always present, may be either indexed or direct, and can take a number of different quantized, compressed formats (e.g., floating point, 8-bit integer, or 16-bit integer). All remaining attributes may or may not be present for any given vertex, and may be either indexed or direct as desired. The texture coordinate values may, like the position values, be represented in a variety of different formats (e.g., 8-bit integer, 16-bit integer or floating point), as can the surface normal attribute. The diffused and specular color attributes may provide 3 (rgb) or 4 (rgba) values in a variety of formats, including 16-, 24- or 32-bit three-component (rgb) representations and 16-, 24- or 32-bit four-component (rgba) representations. All vertices for a given primitive preferably have the same format.

[0042] In this example, vertex descriptor 306 references
indexed data using a 16-bit pointer into an array
of attributes. The offset used to access a particular
attribute within the array depends upon a number of
factors including, e.g., the number of components in the
attribute; the size of the components; padding between
attributes for alignment purposes; and whether multiple
attributes are interleaved in the same array. A vertex
can have direct and indirect attributes intermixed, and
some attributes can be generated by the hardware (e.g.,
generating a texture coordinate from a position). Any
attribute can be sent either directly or as an index into
an array. Vertex cache 212 includes sufficient cache
lines to handle the typical number of respective data
component streams (e.g., position, normal, color and
texture) without too many cache misses.
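The offset factors listed above (component count, component size, alignment padding, interleaving) combine into a simple stride calculation. The following is a minimal sketch under assumed conventions; the function `attribute_address` and its parameters are hypothetical and are not taken from the described hardware.

```python
from typing import Optional


def attribute_address(base: int, index: int, count: int, comp_size: int,
                      pad: int = 0, stride: Optional[int] = None,
                      field_offset: int = 0) -> int:
    """Return the byte address of attribute element `index`.

    base         -- start address of the attribute array in main memory
    index        -- 16-bit index taken from the vertex stream
    count        -- number of components in the attribute (e.g. 3 for xyz)
    comp_size    -- size of one component in bytes (1, 2 or 4)
    pad          -- alignment padding after each element (separate array)
    stride       -- full per-vertex stride when attributes are interleaved
    field_offset -- byte offset of this attribute inside an interleaved element
    """
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index must fit in 16 bits")
    if stride is None:
        # Separate array: elements are packed back to back, plus any padding.
        stride = count * comp_size + pad
    return base + index * stride + field_offset
```

For example, floating-point xyz positions stored in their own array at 0x1000 place element 5 at 0x1000 + 5 * 12 = 0x103C, while a normal interleaved after the position in a 24-byte element uses `stride=24, field_offset=12`.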

Vertex Cache Implementation 


[0043] Figure 4 shows an example schematic diagram
of vertex cache 212 and associated logic. Vertex
cache 212 in this example includes an 8-kilobyte cache
memory 400 organized as a 512 x 112-bit dual-ported
RAM. Since multiple attribute streams are
looked up in the cache 212, an eight-way set-associative
cache including eight tag lines 402 is used to reduce
thrashing. Each tag line includes a 32 x 16-bit
dual-ported tag RAM 404 and an associated tag status register
406. Tag RAMs 404 store the main memory address of
the corresponding data block stored within vertex RAM
400. Address calculation block 408 determines whether
necessary vertex attribute data is already present within
vertex RAM 400, or whether an additional fetch from
main memory is required. Cache lines are prefetched
from main memory 104 to hide memory latency. Data
required to process a particular component is stored
within a queue 410 having a depth that is proportional to
memory latency.
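The hit-or-miss decision made by address calculation block 408 across the eight tag lines can be sketched as a conventional set-associative tag check. This is a minimal model, assuming that each of the 32 tag RAM entries corresponds to one set; the class `TagArray` and its methods are hypothetical names.

```python
from typing import Optional

NUM_WAYS = 8   # eight tag lines 402, one per way
NUM_SETS = 32  # each tag RAM 404 is 32 entries deep (assumed: one entry per set)


class TagArray:
    """Minimal model of an eight-way set-associative tag lookup."""

    def __init__(self) -> None:
        # tags[way][set] holds a main-memory tag, or None when the entry is empty.
        self.tags = [[None] * NUM_SETS for _ in range(NUM_WAYS)]

    def lookup(self, set_index: int, tag: int) -> Optional[int]:
        """Return the way that holds `tag` in `set_index`, or None on a miss."""
        for way in range(NUM_WAYS):
            if self.tags[way][set_index] == tag:
                return way
        return None  # miss: the block must be fetched from main memory

    def fill(self, way: int, set_index: int, tag: int) -> None:
        """Record that `way` now caches the block identified by `tag`."""
        self.tags[way][set_index] = tag
```

Because each attribute stream (position, normal, color, texture) can occupy a different way of the same set, streams whose addresses collide on a set index need not evict one another, which is the thrashing reduction the paragraph refers to.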

[0044] Figure 5 shows an example memory
address format provided by vertex stream parser 208 to
vertex cache 212. This memory address 460 includes a
field 452 providing a byte offset into a cache line; a tag
RAM address 454; and a main memory address 456 for
comparison with the contents of tag RAMs 404.
Address calculation block 408 compares the main
memory address 456 with the tag RAM 404 contents to
determine whether the required data is already cached
within vertex RAM 400, or whether it needs to be
fetched from main memory 104.
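The three-field address format can be illustrated by splitting a byte address into offset, tag RAM address and main-memory tag. The field widths below are assumptions: the text does not state the cache line size, so a 32-byte line (5 offset bits) is assumed, and the 32-entry tag RAM is taken to imply a 5-bit tag RAM address. The function `split_address` is a hypothetical name.

```python
OFFSET_BITS = 5  # assumed 32-byte cache line (not stated in the text)
INDEX_BITS = 5   # 32-entry tag RAM -> 5-bit tag RAM address


def split_address(addr: int) -> tuple:
    """Split a byte address into (byte offset, tag RAM address, main-memory tag)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                  # field 452
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # field 454
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                 # field 456, compared with tag RAMs
    return offset, index, tag
```

Under these assumed widths, address 0x12345 splits into byte offset 5, tag RAM address 26 and tag 72.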
[0045] The tag status registers 406 store data in the
format shown in Figure 6. A "data valid" field 462
indicates whether the data in that particular cache line is
valid. A counter field 464 keeps track of the number of
entries in queue 410 that depend on the cache line.
Counter field 464 is used when a miss occurs while all
tag status registers 406 show "data valid". Address
calculation block 408 then needs to evict one
of the cache lines to make room for the new entry. If
counter field 464 is not zero, the cache line is still in use
and cannot be evicted. Based on a modified partial
LRU algorithm, address calculation block 408 selects
one of the cache lines for replacement. The "data valid"
field 462 is set to "invalid", and the cache line is
replaced with new contents from main memory 104. If
another attribute index maps to the same cache line, the
counter field 464 is incremented. Once the data arrives
from main memory 104, the "data valid" bit is set and an
entry can be processed from queue 410. Otherwise, the
processing of queue 410 is stalled until the data
arrives in cache RAM 400. Once cache RAM 400
is accessed for the queue entry, counter field 464 is
decremented.
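The reference-counted replacement and in-order queue behavior described above can be sketched for a single set. This is a minimal model under stated assumptions: plain LRU over unreferenced lines stands in for the "modified partial LRU" algorithm, whose details the text does not give, and the names `Line`, `VertexCacheSet`, `request`, `data_arrived` and `process` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    tag: Optional[int] = None  # main-memory block held (or being fetched)
    valid: bool = False        # "data valid" field
    count: int = 0             # queue entries that still depend on this line


class VertexCacheSet:
    """One set with reference-counted lines and an in-order queue.

    Plain LRU over unreferenced lines is substituted for the
    "modified partial LRU" selection (an assumption of this sketch).
    """

    def __init__(self, ways: int = 8) -> None:
        self.lines = [Line() for _ in range(ways)]
        self.lru = list(range(ways))  # least recently used way first
        self.queue = deque()          # way numbers awaiting processing, in order
        self.fetches = []             # tags requested from main memory

    def _touch(self, way: int) -> None:
        self.lru.remove(way)
        self.lru.append(way)

    def request(self, tag: int) -> bool:
        """Queue an access; return False if every line is still referenced."""
        for way, line in enumerate(self.lines):
            if line.tag == tag:            # hit, or a miss already in flight
                line.count += 1
                self.queue.append(way)
                self._touch(way)
                return True
        for way in self.lru:               # miss: pick an unreferenced victim
            line = self.lines[way]
            if line.count == 0:
                line.tag, line.valid, line.count = tag, False, 1
                self.fetches.append(tag)   # issue the fetch from main memory
                self.queue.append(way)
                self._touch(way)
                return True                # return at once, so list edit is safe
        return False                       # no line can be evicted yet

    def data_arrived(self, tag: int) -> None:
        """Set "data valid" for the line that was waiting on `tag`."""
        for line in self.lines:
            if line.tag == tag:
                line.valid = True

    def process(self) -> Optional[int]:
        """Pop the head entry if its data is valid; otherwise stall (None)."""
        if not self.queue or not self.lines[self.queue[0]].valid:
            return None
        way = self.queue.popleft()
        self.lines[way].count -= 1         # entry no longer depends on the line
        return self.lines[way].tag
```

A request that maps to a line whose fetch is still pending only increments that line's counter, so two queue entries share one main-memory fetch; a line is never chosen for replacement while its counter is nonzero.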

[0046] While the invention has been described in 
connection with what is presently considered to be the 
most practical and preferred embodiment, it is to be 
understood that the invention is not to be limited to the 
disclosed embodiment, but on the contrary, is intended 
to cover various modifications and equivalent arrange- 
ments included within the scope of the appended 
claims. 

Claims 

1. In a 3D videographics system including a mem- 
ory storing polygon vertex data, and a 3D graphics 
engine that generates and displays images at least
in part in response to said polygon vertex data, an
improvement comprising a vertex cache arrange-
ment operatively coupled to said memory and to
said 3D graphics engine, said vertex cache 
arrangement caching said vertex data from said 
memory for use by said 3D graphics engine, 
wherein said polygon vertex data includes an 
indexed vertex data representation, and said vertex 
cache arrangement operates in response to said 
indexed vertex data representation to cache 
indexed vertex data and make it available for effi- 
cient access by the 3D graphics engine without 
requiring explicit resorting of said polygon vertex 
data for image display. 
2. A vertex cache arrangement as in claim 1
wherein said 3D graphics engine is disposed on an
integrated circuit, and said vertex cache arrangement
comprises a memory device disposed on said
integrated circuit and operatively coupled to said 3D
graphics engine.

3. A vertex cache arrangement as in claim 2
wherein said memory device comprises a set-asso- 
ciative cache memory for caching said vertex data. 

4. A vertex cache arrangement as in claim 3 
wherein said set-associative cache memory pro- 
vides eight cache lines. 

5. A vertex cache arrangement as in claim 2 
wherein said memory device comprises an 8 kilo- 
byte low-latency random access memory. 

4. A vertex cache arrangement as in claim 1 includ- 
ing plural cache tag lines. 

5. A vertex cache arrangement as in claim 1 includ- 
ing plural cache tag status registers. 

6. A vertex cache arrangement as in claim 1 includ- 
ing a queue and a miss queue. 

7. A vertex cache arrangement as in claim 1
wherein said vertex cache arrangement is coupled 
to said memory and, in use, fetches vertex data 
from said memory as said vertex data is needed by 
said 3D graphics engine. 

8. A vertex cache arrangement as in claim 1 further
including a hardware-based inverse quantizer
operatively coupled between said vertex cache and said
3D graphics engine, said inverse quantizer convert- 
ing plural vertex data formats into a uniform float- 
ing-point format for consumption by said 3D 
graphics engine. 

9. A vertex cache arrangement as in claim 1
wherein said indexed vertex data representation 
comprises an indexed array referencing attribute 
data. 

10. A vertex cache arrangement as in claim 9 
wherein said indexed array directly references said 
attribute data.
11. A vertex cache arrangement as in claim 9
wherein said indexed array indirectly references 
said attribute data. 
12. In a 3D videographics system including a
memory storing polygon vertex data, and a 3D graphics
engine that generates and displays images at least 
in part in response to said polygon vertex data, an 
improvement comprising a method for accessing 
said vertex data from said memory for use by said 
3D graphics engine, said method including the 
steps of representing said polygon vertex data in an 
indexed vertex data representation, and caching 
said indexed vertex data in a low-latency cache 
memory device local to said 3D graphics engine to 
make said indexed vertex data available for efficient 
access by the 3D graphics engine. 

13. A method as in claim 12 further including the
step of fetching vertex data from said memory to 
said cache memory device as said vertex data is 
needed by said 3D graphics engine. 

14. A method as in claim 12 further including the
step of converting plural vertex data formats stored 
in said cache memory device into a uniform float- 
ing-point format for consumption by said 3D graph- 
ics engine. 